Internet searching using semantic disambiguation and expansion

ABSTRACT

The invention provides a system and a method of searching for information in a database using a query. The method comprises the steps of: disambiguating the query to identify keyword senses associated with the query; disambiguating information in the database according to the keyword senses; indexing the information in the database according to the keyword senses; expanding the keyword senses to include relevant semantic synonyms for the keyword senses to create a list of expanded keyword senses; searching the database to find relevant information for the query using the expanded keyword senses; and providing search results that include information containing the keyword senses and other semantically related word senses. The system comprises modules which disambiguate queries and information and index the information in a database of word senses.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. application Ser. No. 10/921,820, filed Aug. 24, 2004, which claims priority from U.S. Provisional Application No. 60/496,681 filed on Aug. 21, 2003. The entire disclosures of the aforementioned prior applications are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to Internet searching, and more particularly to Internet searching using semantic disambiguation and expansion.

BACKGROUND

When working with large sets of data, such as a database of documents or web pages on the Internet, the volume of available data can make it difficult to find information of relevance. Various methods of searching are used in an attempt to find relevant information in such stores of information. Some of the best known systems are Internet search engines, such as Yahoo (trademark) and Google (trademark), which allow users to perform keyword-based searches. These searches typically involve matching keywords entered by the user with keywords in an index of web pages.

However, existing Internet search methods often produce results that are not particularly useful. The search may return many results, but only a few or none may be relevant to the user's query. On the other hand, the search may return only a small number of results, none of which is precisely what the user is seeking, while failing to return potentially relevant results.

One reason for some difficulties encountered in performing such searches is the ambiguity of words used in natural language. Specifically, difficulties are often encountered because one word can have several meanings. This difficulty has been addressed in the past by using a technique called word sense disambiguation, which involves changing words into word senses having specific semantic meanings. For example, the word “bank” could have the sense of “financial institution” attached to it, or another definition.

U.S. Pat. No. 6,453,315 teaches meaning-based information organization and retrieval. This patent teaches creating a semantic space from a lexicon of concepts and relations between concepts. Queries are mapped to meaning differentiators which represent the location of the query in the semantic space. Searching is accomplished by determining a semantic difference between differentiators to determine closeness in meaning. This system relies upon the user to refine the search based on the meanings determined by the system or, alternatively, to navigate through nodes found in the search results.

As known in the art, the effectiveness of information retrieval is quantified by “precision” and “recall”. Precision is quantified by dividing the number of correct results found in a search by the total number of results. Recall is quantified by dividing the number of correct results found in a search by the total number of possible correct results. Perfect (i.e. 100%) recall may be obtained simply by returning all possible results, except, of course, this will give very poor precision. Most existing systems strive to balance the criteria of precision and recall. Increasing recall, for example by providing more possible results by use of synonyms, can consequently reduce precision. On the other hand, increasing precision by narrowing the search results, for example by selecting results that match the exact sequence of words in a query, can reduce recall.
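The two measures can be made concrete with a short Python sketch. The function name `precision_recall` and the document IDs are illustrative assumptions; the arithmetic follows the definitions above, including the point that returning everything maximizes recall at the expense of precision.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for a single query.

    returned: result IDs produced by a search.
    relevant: every result ID that is actually correct for the query.
    """
    returned, relevant = set(returned), set(relevant)
    correct = returned & relevant
    precision = len(correct) / len(returned) if returned else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall

relevant_docs = {3, 17, 42, 64}

# Returning every document gives perfect recall but very poor precision.
print(precision_recall(range(100), relevant_docs))   # (0.04, 1.0)

# A narrow search: precision 2/3, recall 2/4.
print(precision_recall([3, 17, 5], relevant_docs))
```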

There is a need for a query processing system and method which addresses deficiencies in the prior art.

SUMMARY OF THE INVENTION

According to one aspect of the present invention, there is provided a method of searching information comprising the steps of disambiguating a query, disambiguating and indexing information according to keyword senses, searching the indexed information to find information relevant to the query using keyword senses in the query and other word senses which are semantically related to the keyword senses in the query, and returning search results which include information containing the keyword senses and other semantically related word senses.

The method may be applied to any database which is indexed using keywords. Preferably, the method is applied to a search of the Internet.

The semantic relations may be any logically or syntactically defined type of association between two words. Examples of such associations are synonymy, hyponymy, etc.

The step of disambiguating the query may include assigning probabilities to word senses. Similarly, the step of disambiguating the information may include attaching probabilities to word senses.

The keyword senses used in the method may be coarse groupings of finer word senses.
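A sketch of how coarse groupings might be used, assuming a hypothetical mapping from fine senses to coarse senses (the labels and the `to_coarse` helper are illustrative, not taken from the embodiment). Summing the probabilities of fine senses that share a coarse sense is one plausible way to let a disambiguator that cannot separate closely related fine senses still commit to the coarse sense.

```python
from collections import defaultdict

# Hypothetical grouping of fine senses into coarse senses.
COARSE_OF = {
    "bank/n1": "bank/c1",  # financial institution
    "bank/n3": "bank/c1",  # building housing a financial institution
    "bank/n2": "bank/c2",  # sloping land beside a river
}

def to_coarse(fine_probs):
    """Collapse {fine_sense: probability} into {coarse_sense: probability}.

    Fine senses that have no coarse grouping are passed through unchanged.
    """
    coarse = defaultdict(float)
    for sense, prob in fine_probs.items():
        coarse[COARSE_OF.get(sense, sense)] += prob
    return dict(coarse)

# The two financial senses pool their probability into bank/c1.
print(to_coarse({"bank/n1": 0.45, "bank/n3": 0.25, "bank/n2": 0.30}))
```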

In a further aspect, a method of searching for information in a database using a query is provided. The method comprises the steps of: disambiguating information in the database according to the keyword senses; indexing the information in the database according to the keyword senses; disambiguating the query to identify keyword senses associated with the query; expanding the keyword senses to include relevant semantic relations for the keyword senses to create a list of expanded keyword senses; searching the database to find relevant information for the query using the expanded keyword senses; and providing search results that include information containing the keyword senses and other semantically related word senses.

In the method, disambiguating the information in the database may comprise attaching probabilities to keyword senses. The words in the information may be indexed with multiple senses, and the probability of each sense may be stored with it in the index.
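The following Python sketch shows one way to keep several senses of a word in the index together with their probabilities. The input format (one `{sense: probability}` dict per word), the `min_prob` cutoff and the sense labels are assumptions made for illustration.

```python
from collections import defaultdict

def index_document(index, doc_id, disambiguated_words, min_prob=0.1):
    """Add one document to a sense index that stores sense probabilities.

    disambiguated_words: one {sense_label: probability} dict per word, as a
    disambiguation step might produce. Every sense at or above min_prob is
    indexed, so an ambiguous word can be stored under several senses.
    """
    for senses in disambiguated_words:
        for sense, prob in senses.items():
            if prob >= min_prob:
                index[sense].append((doc_id, prob))

index = defaultdict(list)
index_document(index, "doc7", [
    {"bank/n1": 0.70, "bank/n2": 0.25, "bank/v1": 0.05},
    {"interest/n1": 0.90, "interest/n2": 0.10},
])
print(index["bank/n2"])  # [('doc7', 0.25)]
```

Here the improbable verb sense `bank/v1` falls below the cutoff and is not indexed, while both remaining noun senses of “bank” are stored with their probabilities.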

In the method, disambiguating the query may comprise assigning a probability to the keyword senses.

In the method, disambiguating the query to identify specific keyword senses may further comprise utilizing probabilities of each of said specific keyword senses.

In the method, expanding the specific keyword senses may further comprise paraphrasing the query by parsing syntactic structures of the specific keyword senses and identifying additional semantically equivalent queries.

In the method, the keyword senses may represent a coarse grouping of fine keyword senses.

In another aspect, a system for providing information from a database responsive to a query is provided. The system comprises: a database containing data to be searched by the query; an indexing module to create a reference index for the data to be used by the query; a query processing module to apply the query to the database; and a disambiguation module for disambiguating the query to identify keyword senses associated with the query. In particular, for the system: the disambiguation module disambiguates information in the database according to the keyword senses; the indexing module indexes the information in the database according to the keyword senses; and the query processing module expands the keyword senses to include relevant semantic synonyms for the keyword senses to create a list of expanded keyword senses, initiates a search of the database to find relevant information for the query using the expanded keyword senses, and provides search results that include information containing the keyword senses and other semantically related word senses.

In the system, the disambiguation module may assign a probability to the keyword senses to rank the keyword senses. The words in the information may be indexed with multiple senses, and the probability of each sense may be stored with it in the index.

In the system, the keyword senses may represent a coarse grouping of fine keyword senses.

The system may also incorporate other functionalities of the aspects noted with the method described above.

In other aspects, various combinations of sets and subsets of the above aspects are provided.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other aspects of the invention will become more apparent from the following description of specific embodiments thereof and the accompanying drawings which illustrate, by way of example only, the principles of the invention. In the drawings, where like elements feature like reference numerals (and wherein individual elements bear unique alphabetical suffixes):

FIG. 1 is a schematic representation of an information retrieval system providing word sense disambiguation associated with an embodiment of the invention;

FIG. 2 is a schematic representation of words and word senses associated with the system of FIG. 1;

FIG. 3A is a schematic representation of representative semantic relationships of words for the system of FIG. 1;

FIG. 3B is a diagram of data structures used to represent the semantic relationships of FIG. 3A for the system of FIG. 1; and

FIG. 4 is a flow diagram of a method performed by the system of FIG. 1 using the word senses of FIG. 2 and the semantic relationships of FIG. 3A.

DESCRIPTION OF THE EMBODIMENTS

The description which follows, and the embodiments described therein, are provided by way of illustration of an example, or examples, of particular embodiments of the principles of the present invention. These examples are provided for the purposes of explanation, and not limitation, of those principles and of the invention. In the description which follows, like parts are marked throughout the specification and the drawings with the same respective reference numerals.

The following terms will be used in the following description, and have the meanings shown below:

Computer readable storage medium: hardware for storing instructions or data for a computer. For example, magnetic disks, magnetic tape, optically readable media such as CD-ROMs, and semiconductor memory such as PCMCIA cards. In each case, the medium may take the form of a portable item such as a small disk, floppy diskette, or cassette, or it may take the form of a relatively large or immobile item such as a hard disk drive, solid state memory card, or RAM.

Information: documents, web pages, emails, image descriptions, transcripts, stored text, etc. that contain searchable content of interest to users, for example, content related to news articles, newsgroup messages, web logs, etc.

Module: a software or hardware component that performs certain steps and/or processes; may be implemented in software running on a general-purpose processor.

Natural language: a formulation of words intended to be understood by a person rather than a machine or computer.

Network: an interconnected system of devices configured to communicate over a communication channel using particular protocols. This could be a local area network, a wide area network, the Internet, or the like operating over communication lines or through wireless transmissions.

Query: a list of keywords indicative of desired search results; may utilize Boolean operators (e.g. “AND”, “OR”); may be expressed in natural language.

Query module: a hardware or software component to process a query.

Search engine: a hardware or software component to provide search results regarding information of interest to a user in response to a query from the user. The search results may be ranked and/or sorted by relevance.

Referring to FIG. 1, an information retrieval system associated with an embodiment is shown generally by the number 10. The system includes a store of information 12 which is accessible through a network 14. Other methods of access known in the art may also be used. The store of information 12 may include documents, web pages, databases, and the like. Preferably, the network 14 is the Internet, and the store of information 12 comprises web pages. When the network 14 is the Internet, the protocols include TCP/IP (Transmission Control Protocol/Internet Protocol). Various clients 16 are connected to the network 14, by a wire in the case of a physical network or through a wireless transmitter and receiver. Each client 16 includes a network interface as will be understood by those skilled in the art. The network 14 provides the clients 16 with access to the content within the store of information 12. To enable the clients 16 to find particular information, documents, web pages, or the like within the store of information 12, the system 10 is configured to allow the clients 16 to search for information by submitting queries. The queries contain at least a list of keywords and may also have structure in the form of Boolean relationships such as “AND” and “OR.” The queries may also be structured in natural language as a sentence or question.

The system includes a search engine 20 connected to the network 14 to receive the queries from the clients 16 and to direct them to individual documents within the store of information 12. The search engine 20 may be implemented as dedicated hardware, or as software operating on a general purpose processor. The search engine operates to locate documents within the store of information 12 that are relevant to the query from the client.

The search engine 20 generally includes a processor 22. The engine may also be connected, either directly thereto, or indirectly over a network or other such communication means, to a display 24, an interface 26, and a computer readable storage medium 28. The processor 22 is coupled to the display 24 and to the interface 26, which may comprise user input devices such as a keyboard, mouse, or other suitable devices. If the display 24 is touch sensitive, then the display 24 itself can be employed as the interface 26. The computer readable storage medium 28 is coupled to the processor 22 for providing instructions to the processor 22 to instruct and/or configure processor 22 to perform steps or algorithms related to the operation of the search engine 20, as further explained below. Portions or all of the computer readable storage medium 28 may be physically located outside of the search engine 20 to accommodate, for example, very large amounts of storage. Persons skilled in the art will appreciate that various forms of search engines can be used with the present invention.

Optionally, and for greater computational speed, the search engine 20 may include multiple processors operating in parallel or any other multi-processing arrangement. Such use of multiple processors may enable the search engine 20 to divide tasks among various processors. Furthermore, the multiple processors need not be physically located in the same place, but rather may be geographically separated and interconnected over a network as will be understood by those skilled in the art.

Preferably, the search engine 20 includes a database 30 for storing an index of word senses and for storing a knowledge base used by search engine 20. The database 30 stores the index in a structured format to allow computationally efficient storage and retrieval as will be understood by those skilled in the art. The database 30 may be updated by adding additional keyword senses or by referencing existing keyword senses to additional documents. The database 30 also provides a retrieval capability for determining which documents contain a particular keyword sense. The database 30 may be divided and stored in multiple locations for greater efficiency.

According to an embodiment, the search engine 20 includes a word sense disambiguation module 32 for processing words in an input document or a query into word senses. A word sense is a given interpretation ascribed to a word, in view of the context of its usage and its neighbouring words. For example, the word “book” in the sentence “Book me a flight to New York” is ambiguous, because “book” can be a noun or a verb, each with multiple potential meanings. The result of processing the words by the disambiguation module 32 is a disambiguated document or disambiguated query comprising word senses rather than ambiguous or uninterpreted words. The input document may be any unit of information in the store of information, or one of the queries received from clients. The word sense disambiguation module 32 distinguishes between word senses for each word in the document or query. The word sense disambiguation module 32 identifies which specific meaning of the word is the intended meaning using a wide range of interlinked linguistic techniques to analyze the syntax (e.g. part of speech, grammatical relations) and semantics (e.g. logical relations) in context. It may use a knowledge base of word senses which expresses explicit semantic relationships between word senses to assist in performing the disambiguation. The knowledge base may include relationships as described below with reference to FIGS. 3A and 3B.
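The embodiment's disambiguation module combines many linguistic techniques that the text does not detail. As a stand-in only, the sketch below uses the simplified Lesk algorithm, a well-known baseline that scores each sense by the overlap between its definition (gloss) and the surrounding words, and then normalizes add-one smoothed scores into probabilities. The sense inventory, glosses and stop-word list are hypothetical.

```python
import re

# Hypothetical sense inventory: word -> {sense_label: gloss}.
SENSES = {
    "bank": {
        "bank/n1": "financial institution that accepts deposits and lends money at interest",
        "bank/n2": "sloping land beside a river or lake",
        "bank/v1": "deposit or save money in an account",
    },
}
STOP = {"a", "an", "the", "of", "at", "or", "in", "and", "to", "that", "on", "my"}

def content_words(text):
    """Lower-cased alphabetic tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}

def disambiguate(word, sentence):
    """Simplified Lesk: score each sense by gloss/context word overlap.

    Returns {sense_label: probability}. Add-one smoothing keeps every
    sense above zero so that less likely senses can still be indexed.
    """
    context = content_words(sentence) - {word}
    senses = SENSES.get(word, {})
    scores = {label: len(context & content_words(gloss)) + 1
              for label, gloss in senses.items()}
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

print(disambiguate("bank", "The bank raised the interest rate on my loan"))
```

In the example, only the financial gloss shares a content word (“interest”) with the sentence, so `bank/n1` receives probability 0.5 and the other two senses 0.25 each.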

The search engine 20 includes an indexing module 34 for processing a disambiguated document to create the index of keyword senses and storing the index in the database 30. The index includes an entry for each keyword sense relating to the documents in which it may be found. The index is preferably sorted and includes an indication of the locations of each indexed keyword sense. The indexing module 34 creates the index by processing the disambiguated document and adding each keyword sense to the index. Certain keywords, such as “a” or “the”, may appear too many times to be useful and/or may contain very little semantic information. These keywords may not be indexed.
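A minimal sketch of the indexing step, assuming the disambiguated document arrives as a list of `(word, sense_label)` pairs in text order. It records the locations of each keyword sense, skips high-frequency words such as “a” and “the”, and keeps the index sorted; the data layout and the `build_index` name are illustrative assumptions rather than the embodiment's storage format.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the"}  # frequent, low-content keywords

def build_index(disambiguated_docs):
    """Build a sense index: sense -> sorted list of (doc_id, positions).

    disambiguated_docs: {doc_id: [(word, sense_label), ...]} in text order.
    Stop words are skipped, but positions still count every token so that
    they reflect real locations in the document.
    """
    postings = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in disambiguated_docs.items():
        for position, (word, sense) in enumerate(tokens):
            if word.lower() in STOP_WORDS:
                continue
            postings[sense][doc_id].append(position)
    # Sort entries by sense, and each posting list by document ID.
    return {sense: sorted(docs.items()) for sense, docs in sorted(postings.items())}

docs = {
    "d1": [("the", None), ("bank", "bank/n1"), ("pays", "pay/v1"), ("interest", "interest/n1")],
    "d2": [("a", None), ("steep", "steep/a1"), ("bank", "bank/n2")],
}
index = build_index(docs)
print(index["bank/n1"])  # [('d1', [1])]
print(index["bank/n2"])  # [('d2', [2])]
```

Because the keys are senses rather than surface words, the single word “bank” ends up under two separate index entries.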

The search engine 20 also includes a query module 36 for processing queries received from clients 16. The query module 36 is configured to receive queries and transfer them to the disambiguation module 32 for processing. The query module 36 then finds results in the index that are relevant to the disambiguated query, as described further below. The results contain keyword senses semantically related to the word senses in the disambiguated query. The query module 36 provides the results to the client. The results may be ranked and/or scored for relevance to assist the client in interpreting them.

Referring to FIG. 2, the relationship between words and word senses is shown generally by the reference 100. As seen in this example, certain words have multiple senses. Among many other possibilities, the word “bank” may represent: (i) a noun referring to a financial institution; (ii) a noun referring to a river bank; or (iii) a verb referring to an action to save money. The word sense disambiguation module 32 splits the ambiguous word “bank” into less ambiguous word senses for storage in the index. Similarly, the word “interest” has multiple meanings including: (i) a noun representing an amount of money payable relating to an outstanding investment or loan; (ii) a noun representing special attention given to something; or (iii) a noun representing a legal right in something.

Referring to FIGS. 3A and 3B, example semantic relationships between word senses are shown. These semantic relationships are precisely defined types of associations between two words based on meaning. The relationships are between word senses, that is, specific meanings of words.

Specifically in FIG. 3A, for example, a bank (in the sense of a river bank) is a type of terrain and a bluff (in the sense of a noun meaning a land formation) is also a type of terrain. A bank (in the sense of river bank) is a type of incline (in the sense of grade of the land). A bank in the sense of a financial institution is synonymous with a “banking company” or a “banking concern.” A bank is also a type of financial institution, which is in turn a type of business. A bank (in the sense of financial institution) is related to interest (in the sense of money paid on investments) and is also related to a loan (in the sense of borrowed money) by the generally understood fact that banks pay interest on deposits and charge interest on loans.

It will be understood that there are many other types of semantic relationships that may be used. Although they are known in the art, the following are some examples of semantic relationships between words. Words which are in synonymy are synonyms of each other. A hypernym is a relationship where one word represents a whole class of specific instances. For example, “transportation” is a hypernym for a class of words including “train”, “chariot”, “dogsled” and “car”, as these words provide specific instances of the class. Meanwhile, a hyponym is a relationship where one word is a member of a class of instances. From the previous list, “train” is a hyponym of the class “transportation”. A meronym is a relationship where one word is a constituent part of, the substance of, or a member of something. For example, for the relationship between “leg” and “knee”, “knee” is a meronym of “leg”, as a knee is a constituent part of a leg. Meanwhile, a holonym is a relationship where one word is the whole of which a meronym names a part. From the previous example, “leg” is a holonym of “knee”. Any semantic relationships that fall into these categories may be used. In addition, any known semantic relationships that indicate specific semantic and syntactic relationships between word senses may be used.
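Several of these relationships come in converse pairs, so a knowledge base can store one direction and derive the other. The Python sketch below records the examples above in this way; the relation names (`hypernym_of`, `holonym_of`, and so on) and the dictionary-of-sets graph are assumptions for illustration.

```python
# Converse pairs: "A is a hypernym of B" implies "B is a hyponym of A", etc.
CONVERSE = {
    "hypernym_of": "hyponym_of", "hyponym_of": "hypernym_of",
    "holonym_of": "meronym_of", "meronym_of": "holonym_of",
    "synonym_of": "synonym_of",
}

def add_relation(graph, a, relation, b):
    """Record "a <relation> b" and the converse edge implied for b."""
    graph.setdefault(a, set()).add((relation, b))
    graph.setdefault(b, set()).add((CONVERSE[relation], a))

graph = {}
for vehicle in ("train", "chariot", "dogsled", "car"):
    add_relation(graph, "transportation", "hypernym_of", vehicle)
add_relation(graph, "leg", "holonym_of", "knee")

print(sorted(graph["train"]))  # [('hyponym_of', 'transportation')]
print(sorted(graph["knee"]))   # [('meronym_of', 'leg')]
```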

It is known that there are ambiguities in interpretation when strings of keywords are provided as queries, and that having an expanded list of keywords in a query increases the number of results found in the search. The embodiment provides a system and method to identify relevant, disambiguated lists of keywords for a query. Providing such a list, delineated by the senses of words, reduces the amount of extraneous information that is retrieved. The embodiment expands the query language without obtaining unrelated results due to extra senses of a word. For example, expanding the “financial institution” sense of bank will not also expand the other senses such as “river-bank” or “to save”. This allows information management software to identify more precisely the information for which a client is looking.

Expanding a query involves using one or both of the following steps:

1. Adding to a disambiguated query keyword sense any other word, and its associated senses, that is semantically related to the disambiguated keyword sense.

2. Paraphrasing the query by parsing its syntactic structure and transforming it into other semantically equivalent queries. The index contains fields that identify semantic dependencies between pairs of keyword senses that are derived from the syntactic structure of the information. Paraphrasing is a term and concept known in the art.
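Step 1 can be sketched as a lookup over sense-level relations. The sense labels follow the “bank/n1” style described for FIG. 3B, but the relation table itself, including the hyponym `savings_bank/n1`, is a hypothetical fragment. Because expansion starts from a disambiguated sense, expanding `bank/n1` does not pull in river-bank or saving-money terms. Step 2 (paraphrasing) depends on a syntactic parser and is not sketched here.

```python
# Hypothetical fragment of a sense-level knowledge base. Each entry reads
# "related_sense is a <relation> of this sense".
RELATED = {
    "bank/n1": [("synonym", "banking_company/n1"),
                ("synonym", "banking_concern/n1"),
                ("hyponym", "savings_bank/n1")],
    "bank/n2": [("synonym", "riverbank/n1")],
    "bank/v1": [("synonym", "save/v1")],
    "holiday/n1": [("synonym", "vacation/n1")],
}

def expand_query(query_senses, allowed=("synonym", "hyponym")):
    """Expand each disambiguated query sense with its directly related senses.

    Returns {sense: relation}, where relation is "self" for senses taken
    from the query itself. Only the chosen sense of each word is expanded.
    """
    expanded = {}
    for sense in query_senses:
        for relation, target in RELATED.get(sense, []):
            if relation in allowed:
                expanded.setdefault(target, relation)
    for sense in query_senses:
        expanded[sense] = "self"  # original query senses always take priority
    return expanded

print(expand_query(["holiday/n1"]))  # {'vacation/n1': 'synonym', 'holiday/n1': 'self'}
```

Recording the relation that introduced each sense makes it possible to weight results by relation type at ranking time.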

It will be recognized that the use of word sense disambiguation in a search addresses the problem of retrieval relevance. Furthermore, users often express queries as they would in natural language. However, since the same meaning can be described in many different ways, users encounter difficulties when they do not express a query in the same specific manner in which the relevant information was initially classified.

For example, if the user is seeking information about “Java” the island, and is interested in “holidays” on Java (island), the user would not retrieve useful documents that had been categorized using the keywords “Java” and “vacation”. It will be recognized that the semantic expansion feature, according to an embodiment, addresses this issue. It has been recognized that deriving precise synonyms and sub-concepts for each key term in a naturally expressed query increases the volume of relevant retrievals. If this were performed using a thesaurus without word sense disambiguation, the result could be worsened. For example, semantically expanding the word “Java” without first establishing its precise meaning would yield a massive and unwieldy result set, with results potentially selected based on word senses as diverse as “Indonesia” and “computer programming”. It will be recognized that the described method of interpreting the meaning of each word and then semantically expanding that meaning returns a more comprehensive and simultaneously more targeted result set.

Referring to FIG. 3B, to assist in disambiguating such word senses, the embodiment utilizes a knowledge base 400 of word senses capturing relationships of words as described above for FIG. 3A. Knowledge base 400 is associated with database 30 and is accessed to assist word sense disambiguation (WSD) module 32 in performing word sense disambiguation. Knowledge base 400 contains definitions of words for each of their word senses and also contains information on relations between pairs of word senses. These relations include the definition of the sense and the associated part of speech (noun, verb, etc.), fine sense synonyms, antonyms, hyponyms, meronyms, pertainyms, similar adjectives relations and other relationships known in the art. While prior art electronic dictionaries and lexical databases, such as WordNet (trademark), have been used in systems, knowledge base 400 provides an enhanced inventory of words and relations. Knowledge base 400 contains: (i) additional relations between word senses, such as the grouping of fine senses into coarse senses, new types of inflectional and derivational morphological relations, and other special purpose semantic relations; (ii) large-scale corrections of errors in data obtained from published sources; and (iii) additional words, word senses, and associated relations that are not present in other prior art knowledge bases.

In the embodiment, knowledge base 400 is a generalized graph data structure and is implemented as a table of nodes 402 and a table of edge relations 404, each edge connecting two nodes. Each table is described in turn. In other embodiments, other data structures, such as linked lists, may be used to implement knowledge base 400.

In table 402, each node is an element in a row of table 402. A record for each node may have up to the following fields: an ID field 406, a type field 408 and an annotation field 410. There are two types of entries in table 402: a word and a word sense definition. For example, the word “bank” in ID field 406A is identified as a word by the “word” entry in type field 408A. Also, exemplary table 402 provides several definitions of words. To catalog the definitions and to distinguish definition entries in table 402 from word entries, labels are used to identify definition entries. For example, the entry in ID field 406B is labeled “LABEL001”. A corresponding definition in type field 408B identifies the label as a “fine sense” word relationship. A corresponding entry in annotation field 410B identifies the label as “Noun. A financial institution”. As such, a “bank” can now be linked to this word sense definition. Furthermore, an entry for the word “brokerage” may also be linked to this word sense definition. Alternate embodiments may use a common word with a suffix attached to it, in order to facilitate recognition of the word sense definition. For example, an alternative label could be “bank/n1”, where the “/n1” suffix identifies the label as a noun (n) and the first meaning for that noun. It will be appreciated that other label variations may be used. Other identifiers to identify adjectives, adverbs and others may be used. The entry in type field 408 identifies the type associated with the word. There are several types available for a word, including: word, fine sense and coarse sense. Other types may also be provided. In the embodiment, when an instance of a word has a fine sense, that instance also has an entry in annotation field 410 to provide further particulars on that instance of the word.

Edge/Relations table 404 contains records indicating relationships between two entries in nodes table 402. Table 404 has the following entries: from node ID column 412, to node ID column 414, type column 416 and annotation column 418. Columns 412 and 414 are used to link two entries in table 402 together. Column 416 identifies the type of relation that links the two entries. Each record has the ID of the origin and the destination node, the type of the relation, and may have annotations based on the type. Types of relations include “root word to word”, “word to fine sense”, “word to coarse sense”, “coarse to fine sense”, “derivation”, “hyponym”, “category”, “pertainym”, “similar” and “has part”. Other relations may also be tracked therein. Entries in annotation column 418 provide a (numeric) key to uniquely identify an edge type going from a word node to either a coarse node or fine node for a given part-of-speech.
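The two tables can be sketched with Python's built-in `sqlite3` module. The schema below is an assumption that mirrors fields 406 to 410 and columns 412 to 418; the sample rows reproduce the “bank”, “brokerage” and “LABEL001” example, and the edge key value `1` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (                       -- table 402
    id         TEXT PRIMARY KEY,           -- ID field 406
    type       TEXT NOT NULL,              -- type field 408
    annotation TEXT                        -- annotation field 410
);
CREATE TABLE edges (                       -- table 404
    from_id    TEXT REFERENCES nodes(id),  -- column 412
    to_id      TEXT REFERENCES nodes(id),  -- column 414
    type       TEXT NOT NULL,              -- column 416
    annotation INTEGER                     -- column 418: numeric edge key
);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    ("bank", "word", None),
    ("brokerage", "word", None),
    ("LABEL001", "fine sense", "Noun. A financial institution"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?)", [
    ("bank", "LABEL001", "word to fine sense", 1),
    ("brokerage", "LABEL001", "word to fine sense", 1),
])

def fine_senses(word):
    """Return (label, definition) for each fine sense linked to a word."""
    return conn.execute(
        """SELECT n.id, n.annotation FROM edges e
           JOIN nodes n ON n.id = e.to_id
           WHERE e.from_id = ? AND e.type = 'word to fine sense'
           ORDER BY e.annotation""",
        (word,),
    ).fetchall()

def words_for_sense(label):
    """Return every word that links to the given sense label."""
    rows = conn.execute(
        "SELECT from_id FROM edges WHERE to_id = ? ORDER BY from_id", (label,))
    return [row[0] for row in rows]

print(fine_senses("bank"))          # [('LABEL001', 'Noun. A financial institution')]
print(words_for_sense("LABEL001"))  # ['bank', 'brokerage']
```

Storing the graph as two flat tables keeps lookups in both directions (word to sense, sense to words) as simple indexed queries.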

Further detail is now provided on steps performed by the embodiment to perform a search utilizing results from disambiguating words associated with a query. Referring to FIG. 4, a process to perform such a search is shown generally by the reference 300. The process may be divided into two general stages. The first stage comprises pre-processing the information (or a subset of the information) to facilitate the second stage of responding to a query. In the first stage of pre-processing, each document in the store of information (or a subset of the store of information) is summarized to create the index in the database. At step 302, the word sense disambiguation module 32 distinguishes between word senses for each word in each document. The word sense disambiguation module 32 is described above.

The search engine then applies the indexing module 34 to the disambiguated information at step 304 to obtain an index of keyword senses. The indexing module 34 creates the index by processing the disambiguated document and adding each keyword sense to the index. Certain keywords, such as “a” or “the”, may appear too many times to be useful. Preferably, these keywords are not indexed. It will be recognized that this step effectively indexes one word as several different word senses. This index of word senses is stored in the database at step 306.

In the second stage of the process, the search engine receives a query from one of the clients at step 308. The query is parsed into its word components, and then each word can be analyzed on its own and in context with its neighbouring words. Parsing techniques for strings of words are known in the art and are not repeated here. The word sense disambiguation module 32 distinguishes between meanings for each word in the query at step 310.

In the preferred embodiment, as shown at step 312, using knowledge base 400 (FIG. 3B), the search engine expands and paraphrases the disambiguated query to include keyword senses which are semantically related to the specific keyword senses in the query. The expansion is performed on the basis of word sense and accordingly produces a list of word senses which are related to the meaning of the query. The semantic relationships may be those described above with reference to FIGS. 3A and 3B.

The search engine then compares the disambiguated and expanded query to word sense information in the database at step 314. Entries in the index whose word senses match the keyword senses in the query are selected to be results. As noted earlier, database 30 stores both the index of word senses and the knowledge base. The search engine then returns results to the client at step 316. In one embodiment, the results may be weighted according to the semantic relationship between the word senses found in the results and those of the keywords in the query. Thus, for example, a result containing a word sense with a synonymous relationship to the keyword senses in the query may be given a higher weighting than a result containing word senses with a hyponym relationship. The results may also be weighted by a probability that a keyword sense in the disambiguated query and/or disambiguated document is correct. The results may also be weighted by other features of the document or web page corresponding to the results, such as the frequency of the relevant word senses or their location in relation to each other, or by other techniques for ranking results as will be understood by persons skilled in the art.
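A sketch of the weighting described above, assuming each expanded query sense carries the relation that introduced it and a disambiguation probability, and each index entry carries a document-side probability. The weight values, the multiplicative combination and the `savings_bank/n1` example are hypothetical choices, not taken from the embodiment.

```python
from collections import defaultdict

# Hypothetical weights: exact sense > synonym > hyponym, as suggested above.
RELATION_WEIGHT = {"self": 1.0, "synonym": 0.9, "hyponym": 0.6}

def rank(expanded_query, index):
    """Score documents for an expanded, disambiguated query.

    expanded_query: {sense: (relation, query_sense_probability)}
    index:          {sense: [(doc_id, document_sense_probability), ...]}
    Each match contributes relation weight x query probability x document
    probability; contributions are summed per document.
    """
    scores = defaultdict(float)
    for sense, (relation, q_prob) in expanded_query.items():
        weight = RELATION_WEIGHT.get(relation, 0.5)  # fallback for other relations
        for doc_id, d_prob in index.get(sense, []):
            scores[doc_id] += weight * q_prob * d_prob
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

index = {
    "bank/n1": [("doc1", 0.95)],
    "banking_company/n1": [("doc2", 1.0)],
    "savings_bank/n1": [("doc3", 1.0)],
}
query = {
    "bank/n1": ("self", 0.8),
    "banking_company/n1": ("synonym", 0.8),
    "savings_bank/n1": ("hyponym", 0.8),
}
for doc_id, score in rank(query, index):
    print(doc_id, round(score, 3))
```

Summing the contributions rewards documents that match several expanded senses; frequency and proximity features of the kind mentioned above could be folded into the same score.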

It will be recognized that the first stage of the process may be performed as a pre-computation step, prior to interaction with the clients. The second stage can then be performed several times without repeating the first stage. The first stage may be performed occasionally, or at regular intervals, to keep the database current. The database can also be updated incrementally by performing the first stage only on subsets of the information, such as newly added or modified information.
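
The incremental update described above amounts to reprocessing only the affected documents and replacing their index entries. The following is a minimal Python sketch. It assumes a hypothetical in-memory `SenseIndex` class. It also assumes that the first-stage disambiguation has already produced a sense-to-probability mapping for each new or modified document.

```python
from collections import defaultdict


class SenseIndex:
    """Hypothetical inverted index from word sense to the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # sense -> {doc_id: probability}
        self.doc_senses = {}               # doc_id -> set of senses indexed for it

    def remove(self, doc_id):
        """Drop every posting for doc_id (no-op if it was never indexed)."""
        for sense in self.doc_senses.pop(doc_id, set()):
            self.postings[sense].pop(doc_id, None)
            if not self.postings[sense]:
                del self.postings[sense]  # keep the index free of empty entries

    def update(self, doc_id, senses):
        """(Re)index one new or modified document.

        senses: sense -> probability, as produced by running the first-stage
        disambiguation on that document alone.
        """
        self.remove(doc_id)
        for sense, prob in senses.items():
            self.postings[sense][doc_id] = prob
        self.doc_senses[doc_id] = set(senses)


idx = SenseIndex()
idx.update("d1", {"bank#river": 0.8})
idx.update("d2", {"bank#finance": 0.9})
idx.update("d1", {"bank#finance": 0.7})  # d1 was edited; only d1 is reprocessed
print(dict(idx.postings))
# → {'bank#finance': {'d2': 0.9, 'd1': 0.7}}
```

Tracking the senses indexed for each document (`doc_senses`) is what lets a modified document be re-indexed without rescanning the rest of the database.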

Although the invention has been described with reference to certain specific embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the scope of the invention as outlined in the claims appended hereto. A person skilled in the art would have sufficient knowledge of one or more of the following disciplines: computer programming, machine learning and computational linguistics.

1. A method of searching for information in a database using a query, said method comprising the steps of: a) creating an index containing said information comprising: disambiguating information in a store of information, said information comprising documents containing text, to identify information keyword meanings, said information keyword meanings comprising meanings of words contained in each respective document and in the context in which said words are used in the respective documents, and indexing said documents in said database according to said information keyword meanings; and, b) processing said query comprising: disambiguating said query to identify query keyword meanings, said query keyword meanings comprising meanings of words contained in said query and in the context in which said words are used in the query; searching the database to identify matches between the query keyword meanings and the indexed information keyword meanings, identifying documents associated with the matched information keyword meanings, said identified documents comprising information relevant to said query; and, providing search results comprising the information relevant to said query.
2. The method of claim 1, wherein steps (a) and (b) are conducted independently of each other.
3. The method of claim 2, wherein step (b) further comprises automatically expanding said query keyword meanings by identifying other word meanings that are semantically related to the query keyword meanings and creating a list of expanded query keyword meanings, wherein said list comprises the query keyword meanings and the identified semantically related word meanings and wherein said list of expanded keyword meanings is used in the step of searching the database.
4. The method of claim 3, wherein said expanding comprises the use of a knowledge base of semantic relationships between word meanings.
5. The method of claim 4, wherein the step of disambiguating the query further comprises assigning probability to said query keyword meanings.
6. A query processing system for providing information relevant to the query comprising: an input means operable to receive said query; an output means operable to provide results responsive to said query; a means for accessing a store of information, said information comprising documents containing text; a disambiguation module for disambiguating words from an input into an output comprising disambiguated keyword meanings, the disambiguation module being operable on words in the documents and the query whereby words from the documents are disambiguated into information keyword meanings and words from the query are disambiguated into query keyword meanings; said disambiguation module being operable: (i) to generate information keyword meanings comprising meanings of words contained in each respective document and in the context in which said words are used in the respective documents and (ii) to generate query keyword meanings comprising meanings of words contained in the query and in the context in which said words are used in the query; an indexing module for indexing the documents based on the information keyword meanings and for storing said indexed information in a database; a query processing module for searching the database for matches between the query keyword meanings and the information keyword meanings and for generating query results comprising documents associated with the matched information keyword meanings.
7. The system of claim 6, wherein said query processing module is further operable to automatically expand the query keyword meanings by identifying other word meanings that are semantically related to the query keyword meanings and to create a list of expanded query keyword meanings.
8. The system of claim 7, wherein said query processing module includes a knowledge base of semantic relationships between word meanings and wherein said module uses said knowledge base for expanding the query keyword meanings.
9. The system of claim 6, wherein the store of information comprises the Internet and said documents comprise web pages.
10. A method of searching for information in a database using a query, said method comprising the steps of: disambiguating information in said database according to keyword senses of words; indexing said information in said database according to said keyword senses; disambiguating said query to identify specific keyword senses associated with said query; expanding said specific keyword senses to include relevant semantic relations for said specific keyword senses to create a list of expanded keyword senses; searching said database to find relevant information for said query using said expanded keyword senses; and providing search results of said included information containing the keyword senses and other semantically related word senses.
11. The method of claim 10, wherein disambiguating the query comprises assigning probability to said keyword senses.
12. The method of claim 11, wherein disambiguating said information in said database comprises attaching probabilities to keyword senses.
13. The method of claim 12, wherein said disambiguating said query to identify specific keyword senses further comprises utilizing probabilities of each of said specific keyword senses.
14. The method of claim 13, wherein said expanding said specific keyword senses further comprises paraphrasing said query by parsing syntactic structures of said specific keyword sense and identifying additional semantically equivalent queries.
15. The method of claim 14, wherein said keyword senses represent a coarse grouping of fine keyword senses.
16. The method of claim 10, wherein said keyword senses represent a coarse grouping of fine keyword senses.
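
Claims 15 and 16 refer to keyword senses that represent a coarse grouping of fine keyword senses. The following Python sketch shows one way such a grouping might be applied. It assumes a hypothetical mapping table `COARSE_GROUP` from fine sense identifiers to coarse groups. It also assumes that fine-sense probabilities are combined by summation. The claims specify neither the table nor the combination rule.

```python
# Hypothetical mapping from fine-grained sense IDs to coarse sense groups
# (an assumption; the claims do not specify how groups are formed).
COARSE_GROUP = {
    "bank#finance_institution": "bank#FINANCE",
    "bank#finance_building": "bank#FINANCE",
    "bank#river": "bank#SLOPE",
}


def to_coarse(fine_probs):
    """Collapse fine-sense probabilities into coarse-sense probabilities by
    summing the probability of the fine senses in each group. A fine sense
    with no group entry is kept as its own group."""
    coarse = {}
    for sense, prob in fine_probs.items():
        group = COARSE_GROUP.get(sense, sense)
        coarse[group] = coarse.get(group, 0.0) + prob
    return coarse


print(to_coarse({"bank#finance_institution": 0.45,
                 "bank#finance_building": 0.35,
                 "bank#river": 0.20}))
```

Under this reading, a query and a document only need to agree on the coarse group. As a result, a disambiguator that cannot reliably separate closely related fine senses still produces consistent matches.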